Can we successfully model menstrual data to individualise training in women?
Mobile apps for tracking the menstrual cycle are increasingly available and are becoming popular for contraception, fertility awareness and exercise planning. The ability to predict an individual’s menstrual cycle length to a high degree of precision could help female athletes to track their period and tailor their training and nutrition accordingly.
As few apps are accurate in terms of menstrual cycle length prediction, the development of an appropriate, exact parametric model for one-step-ahead forecasting of cycle length is required. Such a model should take into account the between- and within-woman variability to identify menstrual cycle patterns and how each symptom could affect cycle length, alongside the implications of significant alterations in cycle length.
According to several studies, menstrual cycle length can be classified into two groups, ‘standard’ and ‘menstrual dysfunction’, where a cycle length greater than 35 days is classified as ‘menstrual dysfunction’ and any other length as ‘standard’. Many statistical models have been proposed in the literature to describe these different groups of menstrual cycles. Generally, cycle lengths in the ‘standard’ group can be analysed using classical statistical approaches. In contrast, a mixture of standard and non-standard cycles can be analysed using a mixture distribution, with one component for the dominant symmetric part of the distribution and another for the heavy right tail.
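The two-group rule above can be written as a minimal Python sketch. The function and constant names are hypothetical and only illustrate the 35-day threshold quoted in the text; they are not taken from the paper.

```python
# Threshold quoted in the text: cycles longer than 35 days are 'menstrual dysfunction'.
STANDARD_MAX_DAYS = 35  # hypothetical constant name


def classify_cycle(length_days: float) -> str:
    """Label a single cycle length using the 35-day rule."""
    if length_days > STANDARD_MAX_DAYS:
        return "menstrual dysfunction"
    return "standard"


# Illustrative (made-up) cycle lengths in days.
cycles = [27, 31, 35, 36, 48]
labels = [classify_cycle(c) for c in cycles]
```

Note that a cycle of exactly 35 days is labelled 'standard', since only lengths greater than 35 days fall in the other group.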
To account for the within-individual variability, the authors of this study focused on the dynamic aspect of menstrual cycles over time, where a predictive distribution is derived from individual repeated measurements using a state-space model formulation. State-space models under a Bayesian approach have the advantage of incorporating between-subject information, both to compensate for the relatively large number of subjects with few repeated measurements and to make predictions for women not included in the sample.
In this paper, the first objective was to develop an appropriate parametric state-space formulation for the marginal distribution of standard menstrual cycles for female athletes. In addition, symptom variables were included in the model’s linear predictor to evaluate how individually reported symptoms might affect an athlete’s menstrual cycle duration. The second aim was to develop a one-step-ahead forecasting interval approach, based on a state-space formulation, to describe the experimental and state process while considering both between- and within-woman variability.
To achieve this, a hybrid predictive model was built using data on 16,524 cycles collected from a sample of 2125 women (mean age 34.38 years, range 18.00 to 47.10; mean weight 62.75 kg, range 42.18 to 100.23 kg; mean height 165.88 cm, range 152.4 to 186.0 cm; number of menstrual cycles per woman ranging from 4 to 53).
A mixed-effect state-space model was fitted to capture the within-subject temporal correlation. The authors assumed that cycle lengths are independent and that menstrual cycles tend to shorten over time as a woman ages. In addition, they combined the Bayesian approach with forecasting processes to include covariates, and used model validation procedures to compare model adequacy.
The modelling procedure was split into three steps: (1) a time trend component using a random walk with an overdispersion parameter, (2) an autocorrelation component using an autoregressive moving-average model, and (3) a linear predictor to account for covariates (e.g. injury, stomach cramps, training intensity).
The first step was to account for a possible trend by identifying the most appropriate error structure for the model, which involved comparing a random walk model with a linear mixed-effect model. The second step added temporal dependence among observations, which was evident in some women in the sample. The third step added explanatory variables to account for their (possible) relationship with cycle length.
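To make the first component more concrete, the following Python sketch simulates cycle lengths from a simple random-walk (local-level) model in which some cycles receive inflated observation noise. It is a toy generative illustration under stated assumptions, not the authors' fitted model: it omits the autoregressive moving-average and covariate components, the function name and the `level_sd` value are hypothetical, and the remaining defaults simply reuse the point estimates reported in this summary.

```python
import random


def simulate_cycles(n_cycles, base_level=27.41, level_sd=0.1,
                    obs_sd=1.0417, over_sd=4.7803, p_over=0.2636, seed=0):
    """Simulate cycle lengths (days) for one woman.

    base_level, obs_sd, over_sd and p_over reuse estimates quoted in the text;
    level_sd is an arbitrary, hypothetical choice for illustration only.
    """
    rng = random.Random(seed)
    level = base_level
    lengths = []
    for _ in range(n_cycles):
        # State equation: the underlying level follows a random walk.
        level += rng.gauss(0.0, level_sd)
        # Overdispersion: with probability p_over the cycle gets the larger SD.
        sd = over_sd if rng.random() < p_over else obs_sd
        # Observation equation: observed length = level + noise.
        lengths.append(level + rng.gauss(0.0, sd))
    return lengths


simulated = simulate_cycles(12)
```

Fixing the seed makes the simulated series reproducible, which is convenient when comparing alternative error structures on the same toy data.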
The complete sample of 2125 women was used for model validation by treating the last observed cycle length as test data.
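The hold-out scheme described here, in which each woman's last observed cycle is treated as test data, can be sketched in Python as follows. The data layout (a dictionary mapping a woman identifier to her chronologically ordered cycle lengths), the function name and the example values are all assumptions made for illustration.

```python
def split_last_cycle(history):
    """Split each woman's series into training cycles and one held-out cycle.

    history: dict mapping a (hypothetical) woman id to a list of cycle
    lengths in chronological order, with at least two cycles each.
    """
    train, test = {}, {}
    for woman_id, lengths in history.items():
        train[woman_id] = lengths[:-1]  # all cycles except the last
        test[woman_id] = lengths[-1]    # last observed cycle, used for validation
    return train, test


# Made-up example data, in days.
history = {"w1": [28, 27, 29, 28], "w2": [30, 31, 29, 30, 32]}
train, test = split_last_cycle(history)
```

A one-step-ahead forecast for each woman would then be produced from `train` and compared with the corresponding value in `test`.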
The overall menstrual cycle length without any reported symptoms was around 27.41 days (range 27.33 to 27.50 days).
The inclusion of an overdispersion parameter suggested that 26.36% (range 23.68% to 29.17%) of cycles in the sample were overdispersed. Furthermore, while a non-overdispersed cycle had a standard deviation (SD) of 1.0417 [range 0.9971 to 1.0875] days, the SD of an overdispersed cycle increased to 4.7803 [range 4.5738 to 5.0007] days, which represents roughly a 4.6-fold increase.
The root mean square error (RMSE), concordance correlation coefficient and Pearson correlation coefficient (r) between the observed and predicted values were calculated. The model had an RMSE of 1.6412 days, a precision of 0.7361 and overall accuracy of 0.9871.
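The three agreement measures named above can be computed as in the Python sketch below. It assumes Lin's definition of the concordance correlation coefficient, under which the coefficient factors into a precision term (Pearson's r) and an accuracy term (a bias correction factor); whether the paper's 'precision' and 'accuracy' follow exactly this decomposition is an assumption here. The function name and the example data are hypothetical.

```python
import math


def agreement_metrics(observed, predicted):
    """Return RMSE, Pearson r, Lin's CCC and the accuracy factor CCC / r."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / n)
    pearson_r = cov / math.sqrt(var_o * var_p)
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
    return {"rmse": rmse, "pearson_r": pearson_r,
            "ccc": ccc, "accuracy": ccc / pearson_r}


# Made-up example: predictions that are perfectly correlated but 1 day too long.
observed = [27.0, 28.0, 30.0, 26.0]
predicted = [28.0, 29.0, 31.0, 27.0]
metrics = agreement_metrics(observed, predicted)
```

In this example the correlation is perfect but the constant one-day bias lowers the concordance coefficient, which is why precision and accuracy are reported separately. Under the same assumed decomposition, the reported precision of 0.7361 and accuracy of 0.9871 would correspond to a concordance correlation coefficient of about 0.73.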
Reporting injury, stomach cramps, tender breasts, and flow amount had a significant effect on menstrual cycle length. Although accurate forecast predictions are reported, improvements in the variables collected and enhancements to the model are still needed to improve forecast precision.
Source: de Paula Oliveira T et al. (2021) Modelling menstrual cycle length in athletes using state‑space models. Scientific Reports 11, 16972.